Overview

Dataset statistics

Number of variables: 10
Number of observations: 9146
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 93
Duplicate rows (%): 1.0%
Total size in memory: 714.7 KiB
Average record size in memory: 80.0 B
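
The percentages and sizes above follow from simple arithmetic on the reported counts. The sketch below recomputes them as a consistency check. The assumption that every cell is stored as an 8-byte float is not stated in the report (it only says that all 10 variables are numeric), and the variable names are illustrative.

```python
# Recompute the derived dataset statistics from the reported counts.
n_variables = 10
n_observations = 9146
duplicate_rows = 93
missing_cells = 0

total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells
duplicate_pct = 100 * duplicate_rows / n_observations

# Assumption: each cell is a float64 (8 bytes); index overhead is ignored.
bytes_per_record = n_variables * 8
data_kib = n_observations * bytes_per_record / 1024

print(f"Missing cells: {missing_pct:.1f}%")
print(f"Duplicate rows: {duplicate_pct:.1f}%")
print(f"Record size: {bytes_per_record} B, data size: {data_kib:.1f} KiB")
```

Under this assumption the data alone occupy about 714.5 KiB. The reported 714.7 KiB is slightly larger, presumably because it also counts index overhead.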

Variable types

NUM: 10

Reproduction

Analysis started: 2021-04-26 13:39:10.210271
Analysis finished: 2021-04-26 13:40:04.264410
Duration: 54.05 seconds
Version: pandas-profiling v2.8.0
Command line: pandas_profiling --config_file config.yaml [YOUR_FILE.csv]
Download configuration: config.yaml
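
A report like this one can also be generated from Python rather than from the command line. The following is a minimal sketch, assuming pandas and pandas-profiling are installed. The function name `build_report`, the report title and the default output file name are hypothetical, and the sketch does not apply the settings from config.yaml.

```python
def build_report(csv_path, html_path="report.html"):
    """Profile a CSV file and write the HTML report to html_path."""
    # Imported lazily so this module loads even without the dependencies.
    import pandas as pd
    from pandas_profiling import ProfileReport

    df = pd.read_csv(csv_path)
    # The title is an illustrative choice, not taken from the report.
    report = ProfileReport(df, title="Dataset profile")
    report.to_file(html_path)
    return html_path
```

Calling `build_report("YOUR_FILE.csv")` would write the report next to the script. Results may differ from this report if config.yaml changes the defaults.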

Warnings

Dataset has 93 (1.0%) duplicate rows (Duplicates)
size_npear is highly correlated with mass_npea and 2 other fields (High correlation)
mass_npea is highly correlated with size_npear and 4 other fields (High correlation)
damage_size is highly correlated with mass_npea and 2 other fields (High correlation)
exposed_area is highly correlated with mass_npea and 4 other fields (High correlation)
std_dev_malign is highly correlated with mass_npea and 3 other fields (High correlation)
damage_ratio is highly correlated with mass_npea and 1 other field (High correlation)

Variables

mass_npea
Real number (ℝ≥0)

HIGH CORRELATION

Distinct count: 8847
Unique (%): 96.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 9903.052173627817
Minimum: 2864.76
Maximum: 36995.4
Zeros: 0
Zeros (%): 0.0%
Memory size: 71.6 KiB

Quantile statistics

Minimum: 2864.76
5-th percentile: 4887.2075
Q1: 6988.42
Median: 8895.965
Q3: 12119.95
95-th percentile: 17593.025
Maximum: 36995.4
Range: 34130.64
Interquartile range (IQR): 5131.53

Descriptive statistics

Standard deviation: 4060.577116
Coefficient of variation (CV): 0.4100328913
Kurtosis: 1.343576214
Mean: 9903.052174
Median Absolute Deviation (MAD): 2361.61
Skewness: 1.09779161
Sum: 90573315.18
Variance: 16488286.51
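
Several of the figures above are derived from one another. As a quick consistency check, the sketch below recomputes the range, IQR, CV and variance of mass_npea from the reported values, and shows one way the median absolute deviation could be computed on raw data. The helper `median_abs_deviation` and the toy list are illustrative assumptions. The report does not state whether its MAD is scaled, so the unscaled form is used here.

```python
from statistics import median

# Reported values for mass_npea.
minimum, maximum = 2864.76, 36995.4
q1, q3 = 6988.42, 12119.95
mean, std = 9903.052174, 4060.577116

value_range = maximum - minimum   # reported: 34130.64
iqr = q3 - q1                     # reported: 5131.53
cv = std / mean                   # reported: 0.4100328913
variance = std ** 2               # reported: 16488286.51


def median_abs_deviation(values):
    """Median of the absolute deviations from the median (unscaled)."""
    values = list(values)
    m = median(values)
    return median(abs(v - m) for v in values)


toy = [1.0, 2.0, 4.0, 7.0, 11.0]  # hypothetical data, not from the dataset
print(round(value_range, 2), round(iqr, 2), round(cv, 4))
print(median_abs_deviation(toy))
```

The same relations (range = max - min, IQR = Q3 - Q1, CV = standard deviation / mean, variance = standard deviation squared) hold for the other variables in this report.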
Histogram with fixed size bins (bins=10)

Most frequent values

Value                Count  Frequency (%)
4000.26              5      0.1%
12958.4              4      < 0.1%
13475.4              4      < 0.1%
12311.8              3      < 0.1%
11887.8              3      < 0.1%
7034.32              3      < 0.1%
7379.32              3      < 0.1%
13004.3              3      < 0.1%
8465.78              3      < 0.1%
5811.82              3      < 0.1%
10605.3              3      < 0.1%
6411.31              3      < 0.1%
9929.73              3      < 0.1%
5189.07              2      < 0.1%
10016.1              2      < 0.1%
10526.1              2      < 0.1%
10146.4              2      < 0.1%
6894.15              2      < 0.1%
19280.7              2      < 0.1%
10314.2              2      < 0.1%
16876.8              2      < 0.1%
8123.23              2      < 0.1%
11954.7              2      < 0.1%
7725.48              2      < 0.1%
8791.92              2      < 0.1%
Other values (8822)  9079   99.3%

Smallest values

Value                Count  Frequency (%)
2864.76              1      < 0.1%
3114.98              2      < 0.1%
3124.09              1      < 0.1%
3281.67              1      < 0.1%
3328.18              1      < 0.1%
3335.6               1      < 0.1%
3338.36              1      < 0.1%
3342.18              1      < 0.1%
3344.43              1      < 0.1%
3348.79              1      < 0.1%

Largest values

Value                Count  Frequency (%)
36995.4              1      < 0.1%
30970.4              1      < 0.1%
29601.9              1      < 0.1%
29078.6              1      < 0.1%
28960.1              1      < 0.1%
28404.9              1      < 0.1%
28001.7              1      < 0.1%
27827.1              1      < 0.1%
27546.4              1      < 0.1%
27446.6              1      < 0.1%

size_npear
Real number (ℝ≥0)

HIGH CORRELATION

Distinct count: 8859
Unique (%): 96.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3032.8278373059266
Minimum: 510.53
Maximum: 13535.0
Zeros: 0
Zeros (%): 0.0%
Memory size: 71.6 KiB

Quantile statistics

Minimum: 510.53
5-th percentile: 1229.72
Q1: 1983.6575
Median: 2684.33
Q3: 3830.745
95-th percentile: 5816.2175
Maximum: 13535
Range: 13024.47
Interquartile range (IQR): 1847.0875

Descriptive statistics

Standard deviation: 1462.334147
Coefficient of variation (CV): 0.4821685321
Kurtosis: 1.887115716
Mean: 3032.827837
Median Absolute Deviation (MAD): 842.23
Skewness: 1.169529805
Sum: 27738243.4
Variance: 2138421.156
Histogram with fixed size bins (bins=10)

Most frequent values

Value                Count  Frequency (%)
1087.13              5      0.1%
3493.28              4      < 0.1%
2025.16              3      < 0.1%
2470.77              3      < 0.1%
2004.67              3      < 0.1%
1053.23              3      < 0.1%
1541.52              3      < 0.1%
2061.5               3      < 0.1%
6172.93              3      < 0.1%
2533.6               3      < 0.1%
2034.78              3      < 0.1%
4331                 3      < 0.1%
3102.87              3      < 0.1%
2238.05              3      < 0.1%
4814.93              3      < 0.1%
4099.99              3      < 0.1%
2329.12              2      < 0.1%
2592.09              2      < 0.1%
5414.72              2      < 0.1%
1614.48              2      < 0.1%
2292.14              2      < 0.1%
3787.25              2      < 0.1%
2422.5               2      < 0.1%
3028.79              2      < 0.1%
1799.5               2      < 0.1%
Other values (8834)  9077   99.2%

Smallest values

Value                Count  Frequency (%)
510.53               1      < 0.1%
520.33               1      < 0.1%
572.7                1      < 0.1%
577.8                1      < 0.1%
601.21               1      < 0.1%
623.42               1      < 0.1%
631.17               1      < 0.1%
673.61               1      < 0.1%
678.82               1      < 0.1%
681.59               1      < 0.1%

Largest values

Value                Count  Frequency (%)
13535                1      < 0.1%
12491.1              1      < 0.1%
10935                1      < 0.1%
10688.2              1      < 0.1%
10658.9              1      < 0.1%
10441.2              1      < 0.1%
10282.1              1      < 0.1%
10263.9              1      < 0.1%
9964.75              1      < 0.1%
9885.56              1      < 0.1%

malign_ratio
Real number (ℝ≥0)

Distinct count: 7386
Unique (%): 80.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.30308287557402147
Minimum: 0.11482
Maximum: 0.5253
Zeros: 0
Zeros (%): 0.0%
Memory size: 71.6 KiB

Quantile statistics

Minimum: 0.11482
5-th percentile: 0.2050725
Q1: 0.2590525
Median: 0.301055
Q3: 0.3430025
95-th percentile: 0.4117525
Maximum: 0.5253
Range: 0.41048
Interquartile range (IQR): 0.08395

Descriptive statistics

Standard deviation: 0.06253294702
Coefficient of variation (CV): 0.2063229303
Kurtosis: 0.01478203704
Mean: 0.3030828756
Median Absolute Deviation (MAD): 0.04198
Skewness: 0.2322996952
Sum: 2771.99598
Variance: 0.003910369464
Histogram with fixed size bins (bins=10)

Most frequent values

Value                Count  Frequency (%)
0.26957              7      0.1%
0.26628              6      0.1%
0.28408              5      0.1%
0.32156              5      0.1%
0.27382              5      0.1%
0.25653              5      0.1%
0.24372              5      0.1%
0.25271              5      0.1%
0.27176              5      0.1%
0.3217               5      0.1%
0.27037              4      < 0.1%
0.25551              4      < 0.1%
0.33982              4      < 0.1%
0.26721              4      < 0.1%
0.32367              4      < 0.1%
0.31073              4      < 0.1%
0.31537              4      < 0.1%
0.32613              4      < 0.1%
0.31598              4      < 0.1%
0.34859              4      < 0.1%
0.32154              4      < 0.1%
0.26591              4      < 0.1%
0.30964              4      < 0.1%
0.259                4      < 0.1%
0.32155              4      < 0.1%
Other values (7361)  9033   98.8%

Smallest values

Value                Count  Frequency (%)
0.11482              1      < 0.1%
0.12161              1      < 0.1%
0.12441              1      < 0.1%
0.126                1      < 0.1%
0.12815              1      < 0.1%
0.12861              1      < 0.1%
0.12975              1      < 0.1%
0.13024              1      < 0.1%
0.13089              1      < 0.1%
0.13512              1      < 0.1%

Largest values

Value                Count  Frequency (%)
0.5253               1      < 0.1%
0.52428              1      < 0.1%
0.52374              1      < 0.1%
0.51992              1      < 0.1%
0.51873              1      < 0.1%
0.51821              1      < 0.1%
0.51569              1      < 0.1%
0.51396              1      < 0.1%
0.51379              1      < 0.1%
0.51123              1      < 0.1%

damage_size
Real number (ℝ≥0)

HIGH CORRELATION

Distinct count: 8861
Unique (%): 96.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 103.90211786573366
Minimum: 10.3101
Maximum: 346.42
Zeros: 0
Zeros (%): 0.0%
Memory size: 71.6 KiB

Quantile statistics

Minimum: 10.3101
5-th percentile: 40.462475
Q1: 64.012525
Median: 88.4583
Q3: 134.209
95-th percentile: 207.808
Maximum: 346.42
Range: 336.1099
Interquartile range (IQR): 70.196475

Descriptive statistics

Standard deviation: 55.45686176
Coefficient of variation (CV): 0.5337413991
Kurtosis: 1.383014273
Mean: 103.9021179
Median Absolute Deviation (MAD): 29.2055
Skewness: 1.223522874
Sum: 950288.77
Variance: 3075.463516
Histogram with fixed size bins (bins=10)

Most frequent values

Value                Count  Frequency (%)
33.732               5      0.1%
162.594              4      < 0.1%
93.9039              3      < 0.1%
184.361              3      < 0.1%
84.1988              3      < 0.1%
51.6741              3      < 0.1%
150.493              3      < 0.1%
49.8648              3      < 0.1%
168.55               3      < 0.1%
44.7148              3      < 0.1%
90.6229              3      < 0.1%
76.4392              3      < 0.1%
207.505              3      < 0.1%
73.6682              3      < 0.1%
52.5591              3      < 0.1%
67.4358              2      < 0.1%
60.2829              2      < 0.1%
160.318              2      < 0.1%
177.66               2      < 0.1%
70.3361              2      < 0.1%
67.5195              2      < 0.1%
236.084              2      < 0.1%
154.49               2      < 0.1%
179.652              2      < 0.1%
115.411              2      < 0.1%
Other values (8836)  9078   99.3%

Smallest values

Value                Count  Frequency (%)
10.3101              1      < 0.1%
10.6891              1      < 0.1%
11.5576              1      < 0.1%
16.7026              1      < 0.1%
19.2305              1      < 0.1%
20.8594              1      < 0.1%
21.3794              1      < 0.1%
22.0497              1      < 0.1%
22.2613              1      < 0.1%
23.112               1      < 0.1%

Largest values

Value                Count  Frequency (%)
346.42               1      < 0.1%
344.346              1      < 0.1%
344.093              1      < 0.1%
342.271              2      < 0.1%
340.422              1      < 0.1%
340.062              1      < 0.1%
337.931              1      < 0.1%
337.505              1      < 0.1%
336.478              1      < 0.1%
335.532              1      < 0.1%

exposed_area
Real number (ℝ≥0)

HIGH CORRELATION

Distinct count: 8949
Unique (%): 97.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1372441.9647981194
Minimum: 387853.4025
Maximum: 4978616.0418
Zeros: 0
Zeros (%): 0.0%
Memory size: 71.6 KiB

Quantile statistics

Minimum: 387853.4025
5-th percentile: 673233.7165
Q1: 959687.2643
Median: 1237057.06
Q3: 1693083.245
95-th percentile: 2448493.426
Maximum: 4978616.042
Range: 4590762.639
Interquartile range (IQR): 733395.9805

Descriptive statistics

Standard deviation: 564677.287
Coefficient of variation (CV): 0.4114398288
Kurtosis: 1.208698938
Mean: 1372441.965
Median Absolute Deviation (MAD): 336057.5122
Skewness: 1.064545725
Sum: 1.255235421e+10
Variance: 3.188604385e+11
Histogram with fixed size bins (bins=10)

Most frequent values

Value                Count  Frequency (%)
569494.3892          5      0.1%
1798705.203          4      < 0.1%
799977.1539          3      < 0.1%
1186263.571          3      < 0.1%
963134.9793          3      < 0.1%
831886.2766          3      < 0.1%
1877843.547          3      < 0.1%
1502130.372          3      < 0.1%
1106670.117          2      < 0.1%
820445.0505          2      < 0.1%
976360.2624          2      < 0.1%
2215735.422          2      < 0.1%
885777.1915          2      < 0.1%
1438462.489          2      < 0.1%
1851013.608          2      < 0.1%
2931521.359          2      < 0.1%
959554.8586          2      < 0.1%
1702364.263          2      < 0.1%
2346247.828          2      < 0.1%
1217529.716          2      < 0.1%
732592.6312          2      < 0.1%
1178899.861          2      < 0.1%
1297715.057          2      < 0.1%
2064925.227          2      < 0.1%
1615553.814          2      < 0.1%
Other values (8924)  9085   99.3%

Smallest values

Value                Count  Frequency (%)
387853.4025          1      < 0.1%
423115.4973          2      < 0.1%
427736.5329          1      < 0.1%
452984.9483          1      < 0.1%
453815.2734          1      < 0.1%
454396.2795          1      < 0.1%
456432.7619          1      < 0.1%
457690.3496          1      < 0.1%
457924.3026          1      < 0.1%
458243.5296          1      < 0.1%

Largest values

Value                Count  Frequency (%)
4978616.042          1      < 0.1%
4256876.116          1      < 0.1%
4172679.498          1      < 0.1%
3981595.216          1      < 0.1%
3864242.825          1      < 0.1%
3853609.234          1      < 0.1%
3833944.925          1      < 0.1%
3807277.967          1      < 0.1%
3784682.868          1      < 0.1%
3759338.507          1      < 0.1%

std_dev_malign
Real number (ℝ≥0)

HIGH CORRELATION

Distinct count: 8802
Unique (%): 96.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 146.30423893505358
Minimum: 31.9704
Maximum: 528.89
Zeros: 0
Zeros (%): 0.0%
Memory size: 71.6 KiB

Quantile statistics

Minimum: 31.9704
5-th percentile: 62.781925
Q1: 95.8539
Median: 126.1385
Q3: 182.2515
95-th percentile: 278.92425
Maximum: 528.89
Range: 496.9196
Interquartile range (IQR): 86.3976

Descriptive statistics

Standard deviation: 70.51217743
Coefficient of variation (CV): 0.4819558062
Kurtosis: 1.23506292
Mean: 146.3042389
Median Absolute Deviation (MAD): 38.4627
Skewness: 1.151081569
Sum: 1338098.569
Variance: 4971.967166
Histogram with fixed size bins (bins=10)

Most frequent values

Value                Count  Frequency (%)
46.039               5      0.1%
211.935              4      < 0.1%
114.353              3      < 0.1%
90.1626              3      < 0.1%
137.35               3      < 0.1%
178.987              3      < 0.1%
112.873              3      < 0.1%
198.959              3      < 0.1%
111.301              3      < 0.1%
140.507              3      < 0.1%
65.2332              3      < 0.1%
227.605              3      < 0.1%
148.631              3      < 0.1%
138.837              3      < 0.1%
115.196              2      < 0.1%
206.916              2      < 0.1%
131.078              2      < 0.1%
256.887              2      < 0.1%
160.421              2      < 0.1%
108.768              2      < 0.1%
113.117              2      < 0.1%
132.87               2      < 0.1%
138.156              2      < 0.1%
203.172              2      < 0.1%
83.7198              2      < 0.1%
Other values (8777)  9079   99.3%

Smallest values

Value                Count  Frequency (%)
31.9704              1      < 0.1%
33.5225              1      < 0.1%
34.1339              1      < 0.1%
34.5528              2      < 0.1%
35.4824              1      < 0.1%
36.0849              1      < 0.1%
37.2144              1      < 0.1%
38.5446              1      < 0.1%
39.0518              1      < 0.1%
39.2869              1      < 0.1%

Largest values

Value                Count  Frequency (%)
528.89               1      < 0.1%
470.477              1      < 0.1%
461.06               1      < 0.1%
452.554              1      < 0.1%
449.5                1      < 0.1%
444.056              1      < 0.1%
430.45               1      < 0.1%
425.373              1      < 0.1%
424.572              1      < 0.1%
424.309              1      < 0.1%

err_malign
Real number (ℝ≥0)

Distinct count: 8831
Unique (%): 96.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3992.9362562869014
Minimum: 1089.19
Maximum: 91983.7
Zeros: 0
Zeros (%): 0.0%
Memory size: 71.6 KiB

Quantile statistics

Minimum: 1089.19
5-th percentile: 2187.3375
Q1: 3177.6825
Median: 3846.32
Q3: 4664.5775
95-th percentile: 5777.33
Maximum: 91983.7
Range: 90894.51
Interquartile range (IQR): 1486.895

Descriptive statistics

Standard deviation: 1780.672859
Coefficient of variation (CV): 0.4459557441
Kurtosis: 758.4896593
Mean: 3992.936256
Median Absolute Deviation (MAD): 739.25
Skewness: 18.40752405
Sum: 36519395
Variance: 3170795.832
Histogram with fixed size bins (bins=10)

Most frequent values

Value                Count  Frequency (%)
1773.46              5      0.1%
4827.72              4      < 0.1%
4628.02              4      < 0.1%
4414.34              3      < 0.1%
5709.89              3      < 0.1%
4644.75              3      < 0.1%
3069.56              3      < 0.1%
4602.96              3      < 0.1%
3034.98              3      < 0.1%
3495.27              3      < 0.1%
3011.5               3      < 0.1%
4057.08              3      < 0.1%
4037.82              3      < 0.1%
3543.18              2      < 0.1%
4211.51              2      < 0.1%
5227.71              2      < 0.1%
4865.39              2      < 0.1%
4161.14              2      < 0.1%
3837.05              2      < 0.1%
2856.23              2      < 0.1%
3243.29              2      < 0.1%
3458.4               2      < 0.1%
6220.49              2      < 0.1%
3870.47              2      < 0.1%
3286.63              2      < 0.1%
Other values (8806)  9079   99.3%

Smallest values

Value                Count  Frequency (%)
1089.19              1      < 0.1%
1108.9               1      < 0.1%
1137.38              1      < 0.1%
1157                 1      < 0.1%
1176.53              1      < 0.1%
1196.94              1      < 0.1%
1215.42              1      < 0.1%
1241.68              1      < 0.1%
1245.95              1      < 0.1%
1254.14              1      < 0.1%

Largest values

Value                Count  Frequency (%)
91983.7              1      < 0.1%
53031.3              1      < 0.1%
46427.71             1      < 0.1%
22963.56             1      < 0.1%
22838.07             1      < 0.1%
19881.74             1      < 0.1%
19424.18             1      < 0.1%
18453.85             1      < 0.1%
18034.9              1      < 0.1%
18030.86             1      < 0.1%

malign_penalty
Real number (ℝ≥0)

Distinct count: 315
Unique (%): 3.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 69.84966105401269
Minimum: 0
Maximum: 340
Zeros: 2
Zeros (%): < 0.1%
Memory size: 71.6 KiB

Quantile statistics

Minimum: 0
5-th percentile: 12
Q1: 31
Median: 54
Q3: 91
95-th percentile: 183.75
Maximum: 340
Range: 340
Interquartile range (IQR): 60

Descriptive statistics

Standard deviation: 55.7853324
Coefficient of variation (CV): 0.798648577
Kurtosis: 3.194113489
Mean: 69.84966105
Median Absolute Deviation (MAD): 27
Skewness: 1.654867254
Sum: 638845
Variance: 3112.003312
Histogram with fixed size bins (bins=10)

Most frequent values

Value                Count  Frequency (%)
32                   140    1.5%
41                   129    1.4%
40                   122    1.3%
33                   121    1.3%
43                   120    1.3%
17                   112    1.2%
29                   111    1.2%
34                   110    1.2%
36                   108    1.2%
27                   106    1.2%
35                   106    1.2%
39                   104    1.1%
30                   104    1.1%
58                   103    1.1%
44                   103    1.1%
22                   103    1.1%
12                   102    1.1%
54                   100    1.1%
26                   100    1.1%
31                   99     1.1%
45                   99     1.1%
23                   99     1.1%
37                   98     1.1%
21                   95     1.0%
18                   94     1.0%
Other values (290)   6458   70.6%

Smallest values

Value                Count  Frequency (%)
0                    2      < 0.1%
1                    28     0.3%
2                    25     0.3%
3                    10     0.1%
4                    39     0.4%
5                    52     0.6%
6                    16     0.2%
7                    41     0.4%
8                    21     0.2%
9                    43     0.5%

Largest values

Value                Count  Frequency (%)
340                  1      < 0.1%
337                  1      < 0.1%
336                  1      < 0.1%
334                  1      < 0.1%
333                  2      < 0.1%
332                  2      < 0.1%
329                  3      < 0.1%
328                  1      < 0.1%
327                  2      < 0.1%
326                  2      < 0.1%

damage_ratio
Real number (ℝ≥0)

HIGH CORRELATION

Distinct count: 8641
Unique (%): 94.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 34.461651924338504
Minimum: 15.228
Maximum: 46.5464
Zeros: 0
Zeros (%): 0.0%
Memory size: 71.6 KiB

Quantile statistics

Minimum: 15.228
5-th percentile: 23.147375
Q1: 30.290225
Median: 35.24575
Q3: 38.806075
95-th percentile: 43.4149
Maximum: 46.5464
Range: 31.3184
Interquartile range (IQR): 8.51585

Descriptive statistics

Standard deviation: 5.972808003
Coefficient of variation (CV): 0.1733175187
Kurtosis: -0.2699688638
Mean: 34.46165192
Median Absolute Deviation (MAD): 3.9416
Skewness: -0.4626063578
Sum: 315186.2685
Variance: 35.67443544
Histogram with fixed size bins (bins=10)

Most frequent values

Value                Count  Frequency (%)
46.5464              93     1.0%
44.4892              5      0.1%
23.015               4      < 0.1%
39.7659              4      < 0.1%
28.4308              4      < 0.1%
36.8564              3      < 0.1%
33.1548              3      < 0.1%
27.968               3      < 0.1%
36.7848              3      < 0.1%
34.8833              3      < 0.1%
35.6615              3      < 0.1%
37.8556              3      < 0.1%
31.7747              3      < 0.1%
19.8525              3      < 0.1%
29.8554              3      < 0.1%
32.8819              3      < 0.1%
38.4272              3      < 0.1%
39.2259              3      < 0.1%
36.3038              3      < 0.1%
37.2619              3      < 0.1%
39.1113              3      < 0.1%
29.7563              3      < 0.1%
38.9741              3      < 0.1%
37.6793              3      < 0.1%
28.7144              3      < 0.1%
Other values (8616)  8976   98.1%

Smallest values

Value                Count  Frequency (%)
15.228               1      < 0.1%
15.6443              1      < 0.1%
15.8688              1      < 0.1%
16.4815              1      < 0.1%
16.6089              1      < 0.1%
16.6222              1      < 0.1%
16.7684              1      < 0.1%
16.8136              1      < 0.1%
16.8555              1      < 0.1%
16.8967              1      < 0.1%

Largest values

Value                Count  Frequency (%)
46.5464              93     1.0%
46.2986              1      < 0.1%
45.9443              1      < 0.1%
45.9023              1      < 0.1%
45.7252              1      < 0.1%
45.7218              1      < 0.1%
45.66                1      < 0.1%
45.5965              1      < 0.1%
45.5846              1      < 0.1%
45.5318              1      < 0.1%

tumor_size
Real number (ℝ≥0)

Distinct count: 6511
Unique (%): 71.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 7.723348239667614
Minimum: 0.0
Maximum: 20.999
Zeros: 56
Zeros (%): 0.6%
Memory size: 71.6 KiB

Quantile statistics

Minimum: 0
5-th percentile: 1.2325
Q1: 2.32
Median: 5.0605
Q3: 13.336
95-th percentile: 18.61375
Maximum: 20.999
Range: 20.999
Interquartile range (IQR): 11.016

Descriptive statistics

Standard deviation: 6.086852455
Coefficient of variation (CV): 0.7881105792
Kurtosis: -1.123046703
Mean: 7.72334824
Median Absolute Deviation (MAD): 3.474
Skewness: 0.5772015728
Sum: 70637.743
Variance: 37.0497728
Histogram with fixed size bins (bins=10)

Most frequent values

Value                Count  Frequency (%)
0                    56     0.6%
2.006                8      0.1%
1.896                7      0.1%
2.243                7      0.1%
1.529                7      0.1%
2.012                7      0.1%
2.85                 7      0.1%
1.645                6      0.1%
1.611                6      0.1%
2.891                6      0.1%
1.849                6      0.1%
2.227                6      0.1%
1.82                 6      0.1%
1.73                 6      0.1%
15.852               6      0.1%
2.573                5      0.1%
2.74                 5      0.1%
1.755                5      0.1%
1.898                5      0.1%
1.651                5      0.1%
2.112                5      0.1%
1.856                5      0.1%
1.756                5      0.1%
1.781                5      0.1%
2.085                5      0.1%
Other values (6486)  8949   97.8%

Smallest values

Value                Count  Frequency (%)
0                    56     0.6%
0.401                1      < 0.1%
0.429                1      < 0.1%
0.431                3      < 0.1%
0.437                1      < 0.1%
0.453                2      < 0.1%
0.455                1      < 0.1%
0.472                1      < 0.1%
0.481                1      < 0.1%
0.484                1      < 0.1%

Largest values

Value                Count  Frequency (%)
20.999               1      < 0.1%
20.985               1      < 0.1%
20.983               1      < 0.1%
20.97                1      < 0.1%
20.959               1      < 0.1%
20.954               1      < 0.1%
20.948               1      < 0.1%
20.944               1      < 0.1%
20.941               1      < 0.1%
20.939               1      < 0.1%

Interactions

Correlations

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
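
The calculation just described can be sketched in a few lines of plain Python. This is a minimal illustration on toy data, not the code the report itself runs; the function name `pearson_r` and the sample lists are assumptions made for the example.

```python
from math import sqrt


def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    std_x = sqrt(sum((a - mean_x) ** 2 for a in x) / n)
    std_y = sqrt(sum((b - mean_y) ** 2 for b in y) / n)
    return cov / (std_x * std_y)


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [3.0, 5.0, 7.0, 9.0, 11.0]               # y = 2x + 1, perfectly linear
print(pearson_r(x, y))                        # close to +1
print(pearson_r(x, [-v for v in y]))          # close to -1
print(pearson_r([10 * v + 3 for v in x], y))  # shifting and rescaling x leaves r unchanged
```

Because the same normalisation (division by n) is used for the covariance and both standard deviations, it cancels out, so the sample and population variants give the same r.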

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
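
As a rough sketch of this recipe, the code below ranks both variables (tied values share the mean of their positions) and then applies the Pearson formula to the ranks. The helper names and the toy data are assumptions for illustration only.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1 to n; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson_r(x, y):
    """Covariance divided by the product of the standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def spearman_rho(x, y):
    """Pearson correlation of the rank variables."""
    return pearson_r(average_ranks(x), average_ranks(y))


x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]      # monotonic but nonlinear
print(spearman_rho(x, y))    # close to 1: the ordering agrees perfectly
print(pearson_r(x, y))       # below 1: the relationship is not linear
```

The toy data show the point made above: a monotonic but nonlinear relationship gives a perfect ρ while Pearson's r falls short of 1.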

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
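
The pair-counting definition can be sketched directly. The version below is the plain form described in the text, with no correction for ties; statistical libraries commonly report a tie-corrected variant (tau-b), so their output may differ when the data contain ties. The function name and the toy data are illustrative.

```python
from itertools import combinations


def kendall_tau(x, y):
    """(concordant - discordant) / total number of pairs, without tie correction."""
    concordant = discordant = 0
    pairs = list(combinations(range(len(x)), 2))
    for i, j in pairs:
        # Positive product: both variables order the pair the same way.
        sign = (x[i] - x[j]) * (y[i] - y[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    return (concordant - discordant) / len(pairs)


x = [1, 2, 3, 4, 5]
y = [1, 3, 2, 5, 4]          # two neighbouring pairs are swapped
print(kendall_tau(x, y))     # 8 concordant, 2 discordant, 10 pairs
```

This brute-force loop visits every pair, so it is quadratic in the number of observations. It is meant to mirror the definition rather than to be efficient on a dataset of 9146 rows.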

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available from the Phik project.

Missing values

Sample

First rows

      mass_npea  size_npear  malign_ratio  damage_size  exposed_area  std_dev_malign  err_malign  malign_penalty  damage_ratio  tumor_size
   0    6930.90     2919.02       0.42116      51.8298  9.888294e+05        109.4870     2758.76              72       39.3620      14.103
   1   15635.70     4879.36       0.31206     223.5500  2.058426e+06        248.8810     5952.53             240       22.0253       2.648
   2   10376.20     2613.88       0.25191     127.3370  1.434676e+06        160.0930     4635.26              73       29.9963       1.688
   3   13093.80     4510.06       0.34444     155.4400  1.812195e+06        173.0150     5273.87              32       28.1354       3.796
   4    7545.21     2882.36       0.38201      85.1237  1.043918e+06        124.4140     3263.35              57       35.0200      18.023
   5    6851.09     2195.18       0.32041      72.8283  9.484467e+05         97.1881     3688.57              40       36.3481       1.709
   6    7069.24     1886.09       0.26680      58.2686  1.024783e+06         76.9447     3168.44              22       39.9961       9.937
   7   16446.10     5115.45       0.31104     204.9740  2.167355e+06        265.8810     6425.91             242       22.9533       2.510
   8    6814.73     2043.78       0.29990      90.0889  9.669835e+05         99.6286     3428.54              27       37.1642      12.568
   9    5049.75      949.82       0.18809      41.2957  6.906229e+05         70.4142     2734.59              27       41.1366      13.428

Last rows

      mass_npea  size_npear  malign_ratio  damage_size  exposed_area  std_dev_malign  err_malign  malign_penalty  damage_ratio  tumor_size
9136   10704.20     3134.65       0.29284      99.6031  1.455340e+06        135.7840     5941.13              37       30.1301      15.570
9137    8584.66     2601.47       0.30303      96.4928  1.215774e+06        112.3210     3605.01              94       36.7163       2.123
9138   20392.00     7400.07       0.36289     185.2990  2.783620e+06        277.5200     4502.82             127       29.6816       8.639
9139   10363.60     2083.95       0.20108      87.8054  1.447825e+06        122.8710     3988.78             156       31.8885       4.139
9140   11679.40     2603.31       0.22289     154.5190  1.650030e+06        169.7800     4607.99              55       29.9713       0.915
9141    7250.25     3120.63       0.43041      82.0410  9.794768e+05        118.7710     3370.24              53       37.0260      13.127
9142   10145.00     3544.90       0.34942      90.1403  1.374393e+06        154.0270     5025.50              30       31.0565      17.091
9143    8086.10     1621.65       0.20054      78.5118  1.134257e+06        104.2840     3804.98              13       34.2739       1.971
9144   14418.90     6373.71       0.44203      84.0665  1.955398e+06        246.4450    19881.74              39       34.5885      17.749
9145    6852.61     1584.64       0.23124      51.3211  9.559976e+05         80.6543     3073.51              28       37.8939      14.103

Duplicate rows

Most frequent

      mass_npea  size_npear  malign_ratio  damage_size  exposed_area  std_dev_malign  err_malign  malign_penalty  damage_ratio  tumor_size  count
   1    4000.26     1087.13       0.27176      33.7320  5.694944e+05         46.0390     1773.46              12       44.4892       1.730      4
  67   12958.40     3493.28       0.26957     162.5940  1.798705e+06        211.9350     4827.72              50       28.4308       2.740      4
  12    5811.82     1053.23       0.18122      52.5591  7.999772e+05         65.2332     3034.98              24       39.7659       4.070      3
  19    6411.31     2061.50       0.32154      44.7148  8.318863e+05         90.1626     3011.50              55       38.9741       3.429      3
  23    7034.32     2238.05       0.31816      76.4392  9.631350e+05        111.3010     3069.56              82       39.1113       1.539      3
  69   13475.40     4814.93       0.35731     168.5500  1.877844e+06        227.6050     4644.75              99       29.7563       7.190      3
   0    3944.70     1086.38       0.27540      34.5915  5.580284e+05         45.4132     1780.16              12       41.1904       2.006      2
   2    4025.81     1206.57       0.29970      32.7381  5.553452e+05         51.3178     1399.10              19       44.3150       1.447      2
   3    4045.88     1385.75       0.34250      35.1162  5.581583e+05         50.8689     1388.97              17       44.2919       1.904      2
   4    4914.20     1260.66       0.25653      38.8667  6.855367e+05         58.6129     1894.31              29       43.3704      12.135      2
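
A "most frequent duplicates" listing like the one above can be approximated by counting identical rows. The sketch below uses only the standard library and toy rows; the function name is hypothetical, and how the report arrives at its figure of 93 duplicate rows (counting only the extra copies, as done here, or some other convention) is an assumption rather than something the report states.

```python
from collections import Counter


def duplicate_summary(rows):
    """Return (row, count) for rows that occur more than once, most frequent first."""
    counts = Counter(tuple(row) for row in rows)
    duplicates = [(row, n) for row, n in counts.items() if n > 1]
    duplicates.sort(key=lambda item: item[1], reverse=True)
    return duplicates


# Toy rows, not taken from the dataset.
rows = [(1.0, 2.0), (3.0, 4.0), (1.0, 2.0), (5.0, 6.0), (1.0, 2.0), (3.0, 4.0)]
summary = duplicate_summary(rows)
extra_copies = sum(n - 1 for _, n in summary)  # rows beyond the first occurrence

for row, n in summary:
    print(row, n)
print("extra copies:", extra_copies)
```

Whether duplicates should be dropped before modelling depends on whether they are repeated measurements or data-entry artefacts, which the report alone cannot tell.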